Decision Trees and Attribute-Oriented Knowledge Discovery in Databases
Abstract
We compare two knowledge representations for data mining, decision trees and knowledge rules, and three mining methods. We show that the decision tree representation and the knowledge rule representation are semantically equivalent. The ID3 algorithm by Quinlan has limited generalization power because it uses only a very simple generalization function of the partial attribute removal type, and it may output too many knowledge rules. Quinlan's production rule generators use attribute removal functions that are more powerful than ID3's. The HCC-algorithm by Han, Cai, and Cercone uses both attribute removal and concept tree ascension as its generalization functions. It normally has more generalization power than the other two, and it also uses thresholds to control the output complexity. In terms of computational efficiency, the HCC-algorithm is the most efficient, ID3 is next, and Quinlan's production rule generators are the least efficient. We also point out some disadvantages of the HCC-algorithm. Known generalization functions and strategies can be combined to form hybrid algorithms, which have advantages over the known methods for various applications. We propose a hybrid algorithm that combines all of these generalization functions while carefully avoiding over-generalization. The proposed algorithm is stable, complete, and non-local, and is about as efficient as Quinlan's production rule generators.
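The two generalization functions named in the abstract, concept tree ascension and attribute removal, together with a threshold on output complexity, can be illustrated with a minimal sketch. This is not the HCC-algorithm as published. The concept tree `CONCEPT_PARENT`, the helper names `ascend`, `generalize`, and `merge`, and the rule "remove the attribute when no higher concept exists" are assumptions made only for illustration.

```python
from collections import Counter

# Hypothetical concept tree for a single attribute, stored as child -> parent.
CONCEPT_PARENT = {
    "Vancouver": "British Columbia",
    "Victoria": "British Columbia",
    "Toronto": "Ontario",
    "Ottawa": "Ontario",
    "British Columbia": "Canada",
    "Ontario": "Canada",
}


def ascend(value, parent):
    """Replace a value by its parent concept; values without a parent stay unchanged."""
    return parent.get(value, value)


def generalize(rows, attr, parent, threshold):
    """Generalize one attribute until it has at most `threshold` distinct values.

    Each pass climbs the concept tree by one level (concept tree ascension).
    If climbing changes nothing and there are still too many distinct values,
    the attribute is dropped from every row (attribute removal).
    """
    rows = [dict(r) for r in rows]  # work on copies, leave the input untouched
    while len({r[attr] for r in rows}) > threshold:
        lifted = [{**r, attr: ascend(r[attr], parent)} for r in rows]
        if lifted == rows:
            # No higher-level concept is available: remove the attribute.
            rows = [{k: v for k, v in r.items() if k != attr} for r in rows]
            break
        rows = lifted
    return rows


def merge(rows):
    """Collapse identical generalized tuples and count how many rows each one covers."""
    counts = Counter(tuple(sorted(r.items())) for r in rows)
    return [(dict(items), n) for items, n in counts.items()]


if __name__ == "__main__":
    students = [
        {"city": "Vancouver", "status": "grad"},
        {"city": "Victoria", "status": "grad"},
        {"city": "Toronto", "status": "grad"},
        {"city": "Ottawa", "status": "undergrad"},
    ]
    for rule, support in merge(generalize(students, "city", CONCEPT_PARENT, 2)):
        print(rule, support)
```

In this sketch a lower threshold forces more ascension and so yields fewer, more general tuples, while an attribute with no concept tree at all (for example a name) is removed outright once it exceeds the threshold.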
Related resources
Attribute-Oriented Induction in Object-Oriented Databases
Knowledge discovery in databases is the nontrivial extraction of implicit, previously unknown, and potentially useful information from data such that the extracted knowledge may facilitate deductive reasoning and query processing in database systems. This branch of study has been ranked among the most promising topics for database research for the 1990s. Due to the dominating influence of relat...
Knowledge Discovery in Databases: An Attribute-Oriented Approach
Knowledge discovery in databases, or data mining, is an important issue in the development of data- and knowledge-base systems. An attribute-oriented induction method has been developed for knowledge discovery in databases. The method integrates a machine learning paradigm, especially learning-from-examples techniques, with set-oriented database operations and extracts generalized data from actua...
A Heuristic for Evaluating Databases for Knowledge Discovery with DBLEARN
We propose a heuristic method for choosing databases for attempting knowledge discovery. The DBLEARN knowledge-discovery program uses an attribute-oriented inductive-inference method to discover potentially significant relations in a database. A concept forest defines the possible generalizations that DBLEARN can make for a database. The concept forest consists of trees, each of which represents ...
Knowledge discovery from patients’ behavior via clustering-classification algorithms based on weighted eRFM and CLV model: An empirical study in public health care services
The rapid growth of information technology (IT) motivates and creates competitive advantages in the health care industry. Nowadays, many hospitals try to build successful customer relationship management (CRM) to recognize target and potential patients, increase patient loyalty and satisfaction, and finally maximize their profitability. Many hospitals have large data warehouses containing customer ...
Publication date: 2007